Privacy Preserving Data Mining For Horizontally Distributed Medical Data Analysis

Authors

  • Yunmei Lu
Abstract

To build reliable prediction models and identify useful patterns, it is increasingly common to assemble data sets from databases maintained by different sources, such as hospitals. However, doing so may divulge sensitive information about individuals, raising privacy concerns that in turn discourage different parties from sharing information. Privacy Preserving Distributed Data Mining (PPDDM) addresses this issue by mining the data without accessing actual data values, so that no information is disclosed beyond the final result. In recent years, a number of state-of-the-art PPDDM approaches have been developed, most of them based on Secure Multiparty Computation (SMC). SMC, however, incurs high communication costs and requires sophisticated secure computation. In addition, the mining process inevitably slows down as the volume of aggregated data grows. In this work, a new framework named Privacy-Aware Non-linear SVM (PAN-SVM) is proposed to build a PPDDM model from multiple data sources. PAN-SVM employs the Secure Sum Protocol to protect privacy at the bottom layer, and reduces the complexity of communication and computation via Nyström matrix approximation and eigendecomposition at the middle layer. The top layer of PAN-SVM speeds up the whole algorithm for large-scale datasets. Based on the PAN-SVM framework, a Privacy Preserving Multi-class Classifier is built; experimental results on several benchmark datasets and microarray datasets show that it improves classification accuracy compared with a regular SVM. In addition, two Privacy Preserving Feature Selection methods based on PAN-SVM are proposed and tested on benchmark data and real-world data. PAN-SVM does not depend on a trusted third party; all participants collaborate as equals.
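As a rough illustration of the bottom layer, the following Python sketch simulates the classic ring-based Secure Sum Protocol, in which parties learn the total of their private values without revealing individual ones. The abstract does not say which variant PAN-SVM uses, so this follows the textbook version; the function name `secure_sum`, the `modulus` parameter, and the single-process simulation of message passing are illustrative assumptions.

```python
import random


def secure_sum(local_values, modulus, rng=None):
    """Simulate the textbook ring-based Secure Sum Protocol (hypothetical helper).

    Party 0 (the initiator) masks its value with a random offset r drawn
    uniformly from [0, modulus); each subsequent party adds its own value
    modulo `modulus` to the running total it receives and forwards it.
    The initiator finally removes r. Every message is uniformly
    distributed, so a single non-colluding party learns nothing about
    another party's value.

    Assumes non-negative integer values whose true sum is below `modulus`.
    """
    rng = rng or random.SystemRandom()
    r = rng.randrange(modulus)                 # initiator's secret mask
    running = (r + local_values[0]) % modulus  # message sent by party 0
    for value in local_values[1:]:
        running = (running + value) % modulus  # each party adds its share
    return (running - r) % modulus             # initiator unmasks the total
```

For example, three hospitals holding 120, 45, and 300 positive cases would jointly obtain 465. The protocol assumes parties do not collude: the two ring neighbours of a party can subtract their messages to recover that party's value.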
Experimental results show that PAN-SVM not only effectively solves the problem of collaborative privacy-preserving data mining by building non-linear classification rules, but also significantly improves the performance of the resulting classifiers.

INDEX WORDS: Privacy preserving, Distributed data mining, Classification, Feature selection, Support Vector Machine, Kernel matrix approximation and decomposition
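The middle layer's kernel approximation can likewise be sketched. The snippet below, a minimal sketch assuming an RBF kernel and a set of landmark points known to all parties, computes Nyström features via an eigendecomposition of the small landmark kernel matrix, so that inner products of the features approximate the full kernel matrix. The kernel choice, `gamma`, `eps`, and the helper names are assumptions; the abstract does not specify how PAN-SVM selects or shares landmarks.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel.
    sq = (np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystrom_features(x, landmarks, gamma, eps=1e-10):
    """Map rows of x to features phi with phi @ phi.T ~= K(x, x).

    C = K(x, L) and W = K(L, L). W is eigendecomposed as U diag(lam) U^T,
    and phi = C U diag(lam)^(-1/2), so phi @ phi.T = C W^+ C^T, the Nystrom
    approximation. Eigenvalues below eps are dropped for numerical stability.
    """
    c = rbf_kernel(x, landmarks, gamma)          # n x m cross-kernel
    w = rbf_kernel(landmarks, landmarks, gamma)  # m x m landmark kernel
    lam, u = np.linalg.eigh(w)                   # W is symmetric PSD
    keep = lam > eps
    return c @ (u[:, keep] / np.sqrt(lam[keep]))  # scale each eigenvector column
```

Because each party can map its own rows locally once the landmarks are fixed, a linear SVM trained on these features approximates a non-linear kernel SVM without anyone forming the full n × n kernel matrix. This is one plausible reading of the layered design, not a description of PAN-SVM's actual procedure.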

Similar sources

Privacy Preserving ID3 over Horizontally, Vertically and Grid Partitioned Data

We consider privacy preserving decision tree induction via ID3 in the case where the training data is horizontally or vertically distributed. Furthermore, we consider the same problem in the case where the data is both horizontally and vertically distributed, a situation we refer to as grid partitioned data. We give an algorithm for privacy preserving ID3 over horizontally partitioned data invo...

Privacy Preserving Association Rule Mining in Horizontally Partitioned Databases Using Cryptography Techniques

Data mining techniques are used to discover hidden information from large databases. Among many data mining techniques, association rule mining is receiving more attention to the researchers to find correlations between items or items sets efficiently. In distributed database environment, the way the data is distributed plays an important role in the problem definition. The data may be distribu...

Privacy-Preserving Distributed Data Mining and Processing on Horizontally Partitioned Data

Kantarcıoğlu, Murat. Ph.D., Purdue University, August, 2005. Privacy-Preserving Distributed Data Mining and Processing on Horizontally Partitioned Data. Major Professor: Christopher W. Clifton. Data mining can extract important knowledge from large data collections, but sometimes these collections are split among various parties. Data warehousing, bringing data from multiple sources under a sin...

Privacy-Preserving Predictive Models for Lung Cancer Survival Analysis

Privacy-preserving data mining (PPDM) is a recent emergent research area that deals with the incorporation of privacy preserving concerns to data mining techniques. We consider a real clinical setting where the data is horizontally distributed among different institutions. Each one of the medical institutions involved in this work provides a database containing a subset of patients. There is re...

Comprehensive Research on Privacy Preserving Emphasizing on Distributed Clustering

Often, the information is sensitive or private in nature and these sensitive data when mined violates the privacy of the individuals. Privacy preserving data mining (PPDM) mines the data but intends to preserve the privacy of susceptible data without ever actually seeing it. This paper recaps the important techniques in PPDM like anonymization, perturbation and cryptography. Nowadays, data mini...

Publication date: 2016